Cost-Sensitive Learning in Answer Extraction

نویسندگان

  • Michael Wiegand
  • Jochen L. Leidner
  • Dietrich Klakow
چکیده

One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of labeled training data is fairly imbalanced. This imbalance has a deteriorating effect on the performance of resulting classifiers. In this paper, we propose a method to tackle class imbalance by applying some form of cost-sensitive learning which is preferable to sampling. We present a simple but effective way of estimating the misclassification costs on the basis of the class distribution. This approach offers three benefits. Firstly, it maintains the distribution of the classes of the labeled training data. Secondly, this form of meta-learning can be applied to a wide range of common learning algorithms. Thirdly, this approach can be easily implemented with the help of state-of-the-art machine learning software.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate

Support vector machine (SVM) is a popular classification technique which classifies data using a max-margin separator hyperplane. The normal vector and bias of the mentioned hyperplane is determined by solving a quadratic model implies that SVM training confronts by an optimization problem. Among of the extensions of SVM, cost-sensitive scheme refers to a model with multiple costs which conside...

متن کامل

Cost-Aware Pre-Training for Multiclass Cost-Sensitive Deep Learning

Deep learning has been one of the most prominent machine learning techniques nowadays, being the state-of-the-art on a broad range of applications where automatic feature extraction is needed. Many such applications also demand varying costs for different types of mis-classification errors, but it is not clear whether or how such cost information can be incorporated into deep learning to improv...

متن کامل

A Cost-Sensitive Learning Strategy for Feature Extraction from Imbalanced Data

In this paper, novel cost-sensitive principal component analysis (CSPCA) and cost-sensitive non-negative matrix factorization (CSNMF) methods are proposed for handling the problem of feature extraction from imbalanced data. The presence of highly imbalanced data misleads existing feature extraction techniques to produce biased features, which results in poor classi cation performance especially...

متن کامل

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

متن کامل

Feature-Cost Sensitive Learning with Submodular Trees of Classifiers

During the past decade, machine learning algorithms have become commonplace in large-scale real-world industrial applications. In these settings, the computation time to train and test machine learning algorithms is a key consideration. At training-time the algorithms must scale to very large data set sizes. At testing-time, the cost of feature extraction can dominate the CPU runtime. Recently,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008